
    Bandit Algorithms for Tree Search

    Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go (Gelly et al., 2006). The UCT algorithm (Kocsis and Szepesvari, 2006), a tree search method based on Upper Confidence Bounds (UCB) (Auer et al., 2002), is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is too "optimistic" in some cases, leading to a regret O(exp(exp(D))) where D is the depth of the tree. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially with the horizon depth is proven to have a regret O(2^D \sqrt{n}), but does not adapt to possible smoothness in the tree. We then analyze Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Next, we introduce a UCB-based Bandit Algorithm for Smooth Trees which takes the actual smoothness of the rewards into account to perform efficient "cuts" of sub-optimal branches with high confidence. Finally, we present an incremental tree search version which applies when the full tree is too big (possibly infinite) to be entirely represented, and show that with high probability, essentially only the optimal branch is developed indefinitely. We illustrate these methods on the global optimization of a Lipschitz function given noisy data.
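    The Flat-UCB idea mentioned above can be sketched as follows: treat each leaf of the tree as an arm of a multi-armed bandit and run UCB1 (Auer et al., 2002) directly on the leaves. The leaf means, noise model, and horizon below are toy choices for illustration, not values from the paper.

```python
import math
import random

def ucb1_flat(leaf_means, n_rounds, seed=0):
    """Flat-UCB sketch: play UCB1 directly on the leaves of the tree.
    `leaf_means` are the (hypothetical) expected rewards of the leaves."""
    rng = random.Random(seed)
    K = len(leaf_means)
    counts = [0] * K        # number of pulls per leaf
    sums = [0.0] * K        # cumulative reward per leaf
    for t in range(1, n_rounds + 1):
        if t <= K:          # play each leaf once to initialize
            arm = t - 1
        else:               # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(range(K), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = leaf_means[arm] + rng.uniform(-0.1, 0.1)   # noisy reward
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1_flat([0.2, 0.5, 0.8], 3000)
best = max(range(3), key=lambda i: counts[i])   # leaf pulled most often
```

    After a few thousand rounds the pull counts concentrate on the best leaf, which is the behavior the finite regret bound above formalizes.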

    Sensitivity analysis in HMMs with application to likelihood maximization

    This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) of the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased and consistent, and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
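    The core IPA idea used above is the pathwise derivative: when the sample can be written as a smooth function of the parameter (here the toy reparameterization X = theta + Z with Z standard Gaussian, which is an illustrative assumption, not the paper's HMM), the gradient of an expectation is the expectation of the per-sample derivative.

```python
import random

def ipa_gradient(theta, f_prime, n=10000, seed=0):
    """Infinitesimal Perturbation Analysis (pathwise) sketch:
    for X = theta + Z with Z ~ N(0, 1), d/dtheta E[f(X)] = E[f'(X)],
    since dX/dtheta = 1 along each sample path.
    `f_prime` is a hypothetical test function, not from the paper."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = theta + z
        total += f_prime(x)    # pathwise derivative f'(x) * dx/dtheta
    return total / n

# d/dtheta E[(theta + Z)^2] = 2 * theta, so at theta = 1.5 we expect ~3.0
g = ipa_gradient(1.5, lambda x: 2.0 * x)
```

    The particle-filter version in the paper propagates such per-particle derivatives alongside the particles, which is why the cost stays linear in the number of particles.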

    Particle filter-based policy gradient for POMDPs

    Our setting is a Partially Observable Markov Decision Process with continuous state, observation and action spaces. Decisions are based on a Particle Filter for estimating the belief state given past observations. We consider a policy gradient approach for parameterized policy optimization. For that purpose, we investigate sensitivity analysis of the performance measure with respect to the parameters of the policy, focusing on Finite Difference (FD) techniques. We show that the naive FD estimator is subject to variance explosion because of the non-smoothness of the resampling procedure. We propose a more sophisticated FD method which overcomes this problem and establish its consistency.
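    A minimal sketch of the finite-difference setup, assuming a toy scalar performance measure: re-using the same random numbers for the +h and -h evaluations keeps the two sample paths coupled, which is the basic coupling idea behind variance reduction in FD estimators (the paper's method for handling resampling non-smoothness is considerably more involved). `J(theta, rng)` is a hypothetical noisy performance measure, not the POMDP of the paper.

```python
import random

def fd_gradient(J, theta, h=1e-2, n=400, common_seed=True):
    """Central finite-difference gradient with common random numbers:
    each pair of evaluations J(theta + h) and J(theta - h) shares a seed,
    so the noise largely cancels in the difference."""
    total = 0.0
    for k in range(n):
        seed_p = k
        seed_m = k if common_seed else n + k   # decouple seeds if requested
        jp = J(theta + h, random.Random(seed_p))
        jm = J(theta - h, random.Random(seed_m))
        total += (jp - jm) / (2.0 * h)
    return total / n

# toy performance measure J(theta) = E[(theta + noise)^2], gradient 2 * theta
def J(theta, rng):
    return (theta + rng.gauss(0.0, 1.0)) ** 2

g = fd_gradient(J, 1.0)   # should be close to 2.0
```

    Setting `common_seed=False` decouples the two evaluations and makes the estimator's variance blow up as h shrinks, which illustrates (in a much simpler setting) the variance problem the paper addresses.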

    Optimal Policies Search for Sensor Management: Application to the AESA Radar

    This report introduces a new approach to solve sensor management problems. Classically, sensor management problems are formalized as Partially Observed Markov Decision Processes (POMDPs). Our original approach consists in deriving the optimal parameterized policy based on stochastic gradient estimation. Two different techniques, named Infinitesimal Perturbation Analysis (IPA) and Likelihood Ratio (LR), can be used to address such a problem. This report discusses how these methods can be used for gradient estimation in the context of sensor management. The effectiveness of this general framework is illustrated by the management of an Active Electronically Scanned Array (AESA) radar.
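    The Likelihood Ratio (score-function) technique mentioned above can be sketched on a toy Gaussian policy; the reward function and parameterization below are illustrative assumptions, not the radar model from the report.

```python
import random

def lr_gradient(theta, f, n=20000, seed=0):
    """Likelihood Ratio (score-function) gradient sketch:
    for X ~ N(theta, 1), d/dtheta E[f(X)] = E[f(X) * (X - theta)],
    since the score of the Gaussian density w.r.t. its mean is (x - theta).
    `f` is a hypothetical reward function."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += f(x) * (x - theta)    # reward weighted by the score
    return total / n

# d/dtheta E[(theta + Z)^2] = 2 * theta, so at theta = 1.0 we expect ~2.0
g = lr_gradient(1.0, lambda x: x * x)
```

    Unlike IPA, the LR estimator only needs the density of the policy to be differentiable, not the reward itself, which is why the two techniques complement each other in this framework.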

    Optimal Policies Search for Sensor Management

    This paper introduces a new approach to solve sensor management problems. Classically, sensor management problems can be well formalized as Partially Observed Markov Decision Processes (POMDPs). The original approach developed here consists in deriving the optimal parameterized policy based on a stochastic gradient estimation. We assume in this work that it is possible to learn the optimal policy off-line (in simulation) using models of the environment and of the sensor(s). The learned policy can then be used to manage the sensor(s). In order to estimate the gradient in a stochastic context, we introduce a new method based on Infinitesimal Perturbation Analysis (IPA). The effectiveness of this general framework is illustrated by the management of an Electronically Scanned Array Radar. First simulation results are presented.

    Numerical methods for sensitivity analysis of Feynman-Kac models

    The aim of this work is to provide efficient numerical methods to estimate the gradient of a Feynman-Kac flow with respect to a parameter of the model. The underlying idea is to view a Feynman-Kac flow as an expectation of a product of potential functions along a canonical Markov chain, and to use the usual techniques of gradient estimation in Markov chains. Combining this idea with interacting particle methods enables us to obtain two new algorithms that provide tight estimates of the sensitivity of a Feynman-Kac flow. Each algorithm has computational complexity linear in the number of particles and is demonstrated to be asymptotically consistent. We also carefully analyze the differences between these new algorithms and existing ones. We provide numerical experiments to assess the practical efficiency of the proposed methods and explain how to use them to solve a parameter estimation problem in Hidden Markov Models. These algorithms outperform the existing ones in terms of the trade-off between computational complexity and estimation quality.
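    The underlying idea above can be sketched with plain Monte Carlo (no interacting particles): write the flow as E[f(X_n) * prod_k G_theta(X_k)] along a Markov chain and differentiate the product of potentials, which brings the score term sum_k d/dtheta log G_theta(X_k) inside the expectation. The AR(1) chain, the potential G_theta(x) = exp(-theta x^2), and f(x) = x^2 are toy choices, not the models from the paper.

```python
import math
import random

def fk_flow(theta, n_steps=3, n_samples=2000, seed=0):
    """Returns Monte Carlo estimates of a toy Feynman-Kac flow
    E[f(X_n) * prod_k G_theta(X_k)] and of its gradient in theta."""
    rng = random.Random(seed)
    val, grad = 0.0, 0.0
    for _ in range(n_samples):
        x, log_prod, score = 0.0, 0.0, 0.0
        for _ in range(n_steps):
            x = 0.5 * x + rng.gauss(0.0, 1.0)   # canonical Markov chain step
            log_prod += -theta * x * x           # log G_theta(x)
            score += -x * x                      # d/dtheta log G_theta(x)
        w = math.exp(log_prod)                   # product of potentials
        val += x * x * w                         # f(X_n) = X_n^2
        grad += x * x * w * score                # gradient contribution
    return val / n_samples, grad / n_samples

val, grad = fk_flow(0.10)
```

    Since the chain itself does not depend on theta here, the gradient estimate can be checked against a finite difference of the flow value computed with the same seed.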

    A Dynamic Programming Approach to Viability Problems

    Viability theory considers the problem of maintaining a system within a set of viability constraints. The main tool for solving viability problems is the construction of the viability kernel, defined as the set of initial states from which there exists a trajectory that remains in the set of constraints indefinitely. The theory is very elegant and appears naturally in many applications. Unfortunately, the current numerical approaches suffer from low computational efficiency, which limits the potential range of applications of this domain. In this paper we show that the viability kernel is the zero-level set of a related dynamic programming problem, which opens promising research directions for numerical approximation of the viability kernel using tools from approximate dynamic programming. We illustrate the approach using k-nearest neighbors on a toy problem in two dimensions and on a complex dynamical model of the anaerobic digestion process in four dimensions.
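    The fixed-point character of the viability kernel can be sketched on a discrete toy system (an illustrative choice, not the paper's model): iterate K_{n+1} = { x in K_n : exists u with f(x, u) in K_n } until stable. In the paper's dynamic programming view, this stable set is exactly the zero-level set of the associated value function.

```python
def viability_kernel(states, controls, f):
    """Discrete viability-kernel iteration sketch: repeatedly remove
    states from which every control leads outside the current set."""
    kernel = set(states)
    while True:
        nxt = {x for x in kernel
               if any(f(x, u) in kernel for u in controls)}
        if nxt == kernel:        # fixed point reached: this is the kernel
            return kernel
        kernel = nxt

# toy unstable system x' = 2x + u with controls u in {-1, 0, 1},
# constraint set {-3, ..., 3}; only states near 0 can be held forever
K = viability_kernel(range(-3, 4), (-1, 0, 1), lambda x, u: 2 * x + u)
```

    Here the doubling dynamics can only be counteracted close to the origin (e.g. from x = 1, choosing u = -1 gives 2*1 - 1 = 1, a self-loop), so the iteration shrinks the constraint set down to {-1, 0, 1}.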